The GENIE System: Classifying Documents by Combining Mixed-Techniques
Abstract
Automatic text classification remains an open problem, and deploying it in companies and organizations that hold large volumes of text data is not trivial. Many factors affect the results, such as the language, the context, the level of knowledge of the topics discussed, the format of the documents, and the type of language used in the documents to be classified. In this paper we describe GENIE, a multi-language rule-based pipeline system for automatic document categorization. We tested the proposal on several business corpora, and we studied the results of applying different stages of the pipeline to the same data in order to measure the influence of each step on the categorization process. The results obtained are very promising, and GENIE is already being used in real production environments with very good results.
Similar resources
Automated Simultaneous Multiple Feature Classification of MTI Data
Los Alamos National Laboratory has developed and demonstrated a highly capable system, GENIE, for the two-class problem of detecting a single feature against a background of non-feature. In addition to the two-class case, however, a commonly encountered remote sensing task is the segmentation of multispectral image data into a larger number of distinct feature classes or land cover types. To thi...
Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents' contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...
University education strategies wisdom-based University
Introduction: Identifying effective educational strategies in developing the skills required by academics with an Islamic approach is one of the basic and urgent needs of the country's academic community. In this regard, the present study uses a descriptive-analytical method based on library information and documents, while conceptualizing the term Wisdom and wisdom based university, with the a...
Evolving forest fire burn severity classification algorithms for multi-spectral imagery
Between May 6 and May 18, 2000, the Cerro Grande/Los Alamos wildfire burned approximately 43,000 acres (17,500 ha) and 235 residences in the town of Los Alamos, NM. Initial estimates of forest damage included 17,000 acres (6,900 ha) of 70-100% tree mortality. Restoration efforts following the fire were complicated by the large scale of the fire, and by the presence of extensive natural and man-...
Publication date: 2014